After the wrangling phase was complete and all necessary cleaning had been done, an exploratory analysis was carried out on the dataset and several insights emerged from the study. They are detailed in this report.
In the dog stage category, the majority of observations have no dog stage attribute and were classified as unidentified. Among the observations that do have a dog stage, floofer has the lowest occurrence at 7, followed by puppo at 22. Pupper ranks highest at 197, followed by doggo in second place at 63.
For the image predictions, the first prediction appears to be the most confident of the three: the puppo dog stage has the highest mean confidence in the first prediction, at 72%, i.e. 72% confidence in the accuracy of the first image prediction. The third prediction has a low mean confidence, so the third predictions are not very reliable.
The majority of tweets were sent from Twitter for iPhone, with 1890 occurrences, representing more than 98% of the tweet source category. Looking at the image predictions, 1425 tweets contain a valid dog prediction in the first prediction, 1445 in the second prediction, and 1402 in the third prediction.
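The stage counts and per-stage mean confidence described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the column names `dog_stage` and `p1_conf`, the helper names, and the toy rows are all assumptions, and the values are made up rather than taken from the dataset.

```python
from collections import Counter, defaultdict

# Toy rows standing in for the cleaned dataframe (hypothetical values).
rows = [
    {"dog_stage": "pupper", "p1_conf": 0.60},
    {"dog_stage": "pupper", "p1_conf": 0.80},
    {"dog_stage": "doggo", "p1_conf": 0.50},
    {"dog_stage": None, "p1_conf": 0.90},
    {"dog_stage": "puppo", "p1_conf": 0.75},
]


def stage_counts(rows):
    """Count tweets per dog stage, labelling missing stages 'unidentified'."""
    return Counter(r["dog_stage"] or "unidentified" for r in rows)


def mean_confidence_by_stage(rows, conf_key="p1_conf"):
    """Average the prediction confidence within each dog stage."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["dog_stage"] or "unidentified"].append(r[conf_key])
    return {stage: sum(vals) / len(vals) for stage, vals in grouped.items()}


counts = stage_counts(rows)
means = mean_confidence_by_stage(rows)
```

With a pandas dataframe, the same two summaries would typically come from a value count on the stage column and a group-by mean on the confidence column.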
Retweet count and favorite count were plotted on a scatterplot, with the data points categorized by prediction validity, to show the relationship between the three variables. Both the retweet count (x-axis) and the favorite count (y-axis) were set to a log scale to show the relationship more clearly. The correlation between retweet count and favorite count is positive, which is logical: the more a tweet is retweeted, the larger the audience it reaches and the more likes it can attract. Each data point is categorized by the validity of its image prediction, with a correct prediction marked as true and a wrong prediction marked as false.
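The positive relationship on the log scale can also be checked numerically. The sketch below computes a Pearson correlation on log10-transformed counts; the function names and the sample numbers are hypothetical, and zero counts are dropped because they have no logarithm (a log-scaled axis cannot show them either).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def log_log_correlation(retweets, favorites):
    """Correlate log10(retweets) with log10(favorites), skipping zero counts."""
    pairs = [(r, f) for r, f in zip(retweets, favorites) if r > 0 and f > 0]
    log_r = [math.log10(r) for r, _ in pairs]
    log_f = [math.log10(f) for _, f in pairs]
    return pearson(log_r, log_f)


# Made-up counts for illustration only; the pair with 0 retweets is skipped.
retweets = [10, 100, 1000, 0, 50]
favorites = [40, 300, 5000, 12, 200]
r = log_log_correlation(retweets, favorites)
```

A value of `r` close to 1 indicates a strong positive relationship between the two log-transformed counts.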
The figure below highlights the correlation between retweet count and favorite count, which in this case is positive. This makes sense, since the more a tweet is retweeted, the larger the audience it reaches and the more likes it can attract.
The figure below plots retweet count against favorite count as a scatterplot, with the data points categorized by prediction validity, to show the relationship between the three variables.
In the image prediction dataframe, some entries were not predicted to be dogs in any of the three predictions, so the assessment can be restricted to the predictions that were dogs. We filter out the image predictions that predicted something other than a dog, and a word cloud is then an appropriate way to display the predicted dog breeds. The dog breeds from the first prediction are plotted on a word cloud to give a better view of the breeds predicted. In the figure below, for the first prediction, Labrador retriever and golden retriever have the highest number of predictions, with every other breed represented by name.
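The filtering step and the breed frequencies that feed a word cloud can be sketched as below. The column names `p1` and `p1_dog`, the helper name, and the sample rows are assumptions for illustration, not the study's actual code or data.

```python
from collections import Counter

# Hypothetical first-prediction rows: a label plus a flag for "is a dog".
predictions = [
    {"p1": "golden_retriever", "p1_dog": True},
    {"p1": "Labrador_retriever", "p1_dog": True},
    {"p1": "golden_retriever", "p1_dog": True},
    {"p1": "paper_towel", "p1_dog": False},
]


def breed_frequencies(rows, label_key="p1", is_dog_key="p1_dog"):
    """Keep only rows predicted to be dogs and count each breed label."""
    return Counter(
        r[label_key].replace("_", " ").lower()
        for r in rows
        if r[is_dog_key]
    )


freqs = breed_frequencies(predictions)
# A breed -> count mapping like this is the kind of input a word cloud
# library can size its words from.
```

Counting breeds first, rather than passing raw text to the word cloud, keeps multi-word breeds such as "golden retriever" together as single entries.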
The sample below suggests that the image predictions were spot on, and we can confirm this against the tweet archived in the Twitter archive dataframe. For the prediction at index number 9, the tweet reads "This is Cassie. She is a college pup. Studying international doggo communication and stick theory for dog".
Some entries across the three image predictions were not dogs yet had been given a dog stage, so we concentrated on the predictions that were valid dogs. Valid dog predictions made up the larger share, and most of them were classified as having an unidentified dog stage. The pupper dog stage, with 138 occurrences, is the most common dog stage in the first prediction. Finally, to display the predicted images rather than their URLs, the IPython.display package was used to render the images in HTML. The filtered dataframe of valid dogs was used for this, so that the images displayed are dogs and not something else.
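One way to turn image URLs into displayable HTML is sketched below. It only builds the HTML string; the column names `jpg_url` and `p1`, the helper name, and the example URL are hypothetical, and this is not necessarily how the study rendered its images.

```python
from html import escape


def images_to_html(rows, url_key="jpg_url", label_key="p1", width=150):
    """Build an HTML snippet with one captioned <img> per row."""
    parts = []
    for r in rows:
        parts.append(
            '<figure style="display:inline-block">'
            '<img src="{src}" width="{w}">'
            "<figcaption>{caption}</figcaption>"
            "</figure>".format(
                src=escape(r[url_key], quote=True),
                w=int(width),
                caption=escape(r[label_key]),
            )
        )
    return "\n".join(parts)


# Placeholder row standing in for the filtered valid-dog dataframe.
valid_dogs = [{"jpg_url": "https://example.com/dog1.jpg", "p1": "golden_retriever"}]
gallery = images_to_html(valid_dogs)
# In a Jupyter notebook the string can then be rendered with:
#   from IPython.display import HTML
#   HTML(gallery)
```

Escaping the URL and caption keeps unexpected characters in the data from breaking the generated markup.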